Honolulu County
Scalable Vision Language Model Training via High Quality Data Curation
Dong, Hongyuan, Kang, Zijian, Yin, Weijie, Liang, Xiao, Feng, Chao, Ran, Jiao
In this paper, we introduce SAIL-VL (ScAlable Vision Language Model TraIning via High QuaLity Data Curation), an open-source vision language model (VLM) with state-of-the-art (SOTA) performance at 2B parameters. We introduce three key improvements that contribute to SAIL-VL's leading performance: (1) Scalable high-quality visual understanding data construction: We implement a visual understanding data construction pipeline that enables hundred-million-scale high-quality recaption data annotation. Equipped with this pipeline, we curate SAIL-Caption, a large-scale caption dataset with both large quantity and the highest data quality among open-source caption datasets. (2) Scalable pretraining with high-quality visual understanding data: We scale SAIL-VL's pretraining budget up to 131B tokens and show that even a 2B VLM benefits from scaled-up training data, exhibiting the expected data-size scaling laws in visual understanding and instruction-following performance. (3) Scalable SFT via quantity and quality scaling: We introduce general guidance for instruction data curation to scale up instruction data continuously, allowing us to construct a large SFT dataset of the highest quality. To further improve SAIL-VL's performance, we propose quality scaling, a multi-stage training recipe with curriculum learning, to improve the model's performance scaling curve w.r.t. data size from logarithmic to near-linear. SAIL-VL obtains the highest average score across 19 commonly used benchmarks in our evaluation and achieves top-1 performance among VLMs of comparable size on OpenCompass (https://rank.opencompass.org.cn/leaderboard-multimodal). We release our SAIL-VL-2B model at HuggingFace (https://huggingface.co/BytedanceDouyinContent/SAIL-VL-2B).
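The claimed shift from a logarithmic to a near-linear scaling curve can be checked by fitting both functional forms to benchmark scores at several data sizes. A minimal sketch with invented (token count, score) points, not the paper's actual measurements:

```python
import numpy as np

# Hypothetical (pretraining tokens, benchmark score) points that follow a
# logarithmic scaling law: score = a + b * log(tokens).
tokens = np.array([8e9, 16e9, 32e9, 64e9, 131e9])
score = np.array([55.0, 58.1, 61.0, 64.2, 67.1])

# Least-squares fit of the log-linear model.
X = np.stack([np.ones_like(tokens), np.log(tokens)], axis=1)
coef, *_ = np.linalg.lstsq(X, score, rcond=None)
a, b = coef  # intercept and slope per log-token
```

A near-linear regime would instead be better fit by `score = a + b * tokens`; comparing residuals of the two fits is a simple way to diagnose which regime a training run is in.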
Deep Learning Predicts Mammographic Breast Density in Clinical Breast Ultrasound Images
Bunnell, Arianna, Valdez, Dustin, Wolfgruber, Thomas K., Quon, Brandon, Hung, Kailee, Hernandez, Brenda Y., Seto, Todd B., Killeen, Jeffrey, Miyoshi, Marshall, Sadowski, Peter, Shepherd, John A.
Background: Breast density, as derived from mammographic images and defined by the American College of Radiology's Breast Imaging Reporting and Data System (BI-RADS), is one of the strongest risk factors for breast cancer. Breast ultrasound (BUS) is an alternative breast cancer screening modality, particularly useful for early detection in low-resource, rural contexts. The purpose of this study was to explore an artificial intelligence (AI) model to predict BI-RADS mammographic breast density category from clinical, handheld BUS imaging. Methods: All data are sourced from the Hawaii and Pacific Islands Mammography Registry. We compared deep learning models trained on BUS imaging with machine learning models trained on image statistics alone. The use of AI-derived BUS density as a risk factor for breast cancer was then compared to clinical BI-RADS breast density while adjusting for age. The BUS data were split by individual into 70/20/10% groups for training, validation, and testing. Results: 405,120 clinical BUS images from 14,066 women were selected for inclusion in this study, resulting in 9,846 women for training (302,574 images), 2,813 for validation (11,223 images), and 1,406 for testing (4,042 images). On the held-out testing set, the strongest AI model achieves an AUROC of 0.854 in predicting BI-RADS mammographic breast density from BUS imaging and outperforms all shallow machine learning methods based on image statistics. In cancer risk prediction, age-adjusted AI BUS breast density predicted 5-year breast cancer risk with an AUROC of 0.633, as compared to 0.637 for age-adjusted clinical breast density. Conclusions: BI-RADS mammographic breast density can be estimated from BUS imaging with high accuracy using a deep learning model. Furthermore, we demonstrate that AI-derived BUS breast density is predictive of 5-year breast cancer risk in our population.
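AUROC, the metric used throughout this study, has a direct rank-based interpretation: the probability that a randomly chosen positive case is scored above a randomly chosen negative one. A minimal, self-contained sketch of that computation (not the authors' evaluation code):

```python
def auroc(labels, scores):
    """AUROC via the Mann-Whitney statistic: the probability that a
    randomly chosen positive outranks a randomly chosen negative,
    with ties counting as half a win."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))
```

An AUROC of 0.5 corresponds to chance-level ranking, which is why the 0.633 vs. 0.637 comparison above indicates near-equivalent risk discrimination.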
Semi-Markovian Planning to Coordinate Aerial and Maritime Medical Evacuation Platforms
Al-Husseini, Mahdi, Wray, Kyle H., Kochenderfer, Mykel J.
The transfer of patients between two aircraft using an underway watercraft increases medical evacuation reach and flexibility in maritime environments. The selection of any one of multiple underway watercraft for patient exchange is complicated by the participating aircraft's utilization history and the participating watercraft's position and velocity. The selection problem is modeled as a semi-Markov decision process with an action space including both fixed land and moving watercraft exchange points. Monte Carlo tree search with root parallelization is used to select optimal exchange points and determine aircraft dispatch times. Model parameters are varied in simulation to identify representative scenarios where watercraft exchange points reduce incident response times. We find that an optimal policy with watercraft exchange points outperforms an optimal policy without watercraft exchange points and a greedy policy by 35% and 40%, respectively. In partnership with the United States Army, we deploy the watercraft exchange point for the first time by executing a mock patient transfer with a manikin between two HH-60M medical evacuation helicopters and an underway Army Logistic Support Vessel south of the Hawaiian island of Oahu. Both helicopters were dispatched in accordance with our optimized decision strategy.
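Root parallelization, as used here, runs several independent search trees from the same root state and merges their root statistics before committing to an action. A simplified sketch of that merging step, with an invented action set and a flat Monte Carlo search standing in for each full tree (the paper's actual simulator is a semi-Markov model of aircraft and watercraft dynamics):

```python
import random
from collections import Counter

# Hypothetical action set: two fixed land exchange points and two
# underway watercraft exchange points.
ACTIONS = ["land_A", "land_B", "watercraft_1", "watercraft_2"]

def simulate(action, rng):
    """Stand-in for a semi-Markov rollout: returns a noisy negative
    response time (higher is better). Base values are invented."""
    base = {"land_A": -50.0, "land_B": -55.0,
            "watercraft_1": -35.0, "watercraft_2": -40.0}
    return base[action] + rng.gauss(0.0, 5.0)

def search_tree(budget, rng):
    """One worker's search (flat Monte Carlo at the root for brevity):
    accumulate visit counts and total returns per action."""
    visits, returns = Counter(), Counter()
    for _ in range(budget):
        a = rng.choice(ACTIONS)
        visits[a] += 1
        returns[a] += simulate(a, rng)
    return visits, returns

def root_parallel_search(n_workers=4, budget=500, seed=0):
    """Root parallelization: run independent trees, merge their root
    statistics, then pick the action with the best mean return."""
    total_v, total_r = Counter(), Counter()
    for w in range(n_workers):
        v, r = search_tree(budget, random.Random(seed + w))
        total_v.update(v)
        total_r.update(r)
    return max(ACTIONS, key=lambda a: total_r[a] / max(total_v[a], 1))
```

Because each worker's tree is independent, this scheme parallelizes with no shared state; only the small per-action statistics are combined at the end.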
Evaluating Large Language Models for Anxiety and Depression Classification using Counseling and Psychotherapy Transcripts
Sun, Junwei, Ma, Siqi, Fan, Yiran, Washington, Peter
University of Hawaii at Manoa, Honolulu, HI, USA
*Correspondence should be sent to: pyw@hawaii.edu
These authors contributed equally to this work.
We aim to evaluate the efficacy of traditional machine learning and large language models (LLMs) in classifying anxiety and depression from long conversational transcripts. We fine-tuned both established transformer models (BERT, RoBERTa, Longformer) and more recent large models (Mistral-7B), trained a Support Vector Machine with feature engineering, and assessed GPT models through prompting. We observe that state-of-the-art models fail to enhance classification outcomes compared to traditional machine learning methods.
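Long counseling transcripts exceed the context window of standard transformers such as BERT and RoBERTa (512 tokens), so a common workaround is to split each transcript into overlapping windows and pool the chunk-level predictions. A generic sketch of that approach, not necessarily the authors' exact preprocessing:

```python
def chunk(tokens, max_len=512, stride=256):
    """Split a long transcript into overlapping windows that each fit a
    standard transformer context (e.g. BERT's 512-token limit)."""
    if len(tokens) <= max_len:
        return [tokens]
    windows, start = [], 0
    while start < len(tokens):
        windows.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break
        start += stride
    return windows

def pool(chunk_probs):
    """Mean-pool chunk-level positive-class probabilities into a single
    transcript-level score."""
    return sum(chunk_probs) / len(chunk_probs)
```

Longformer avoids this step by supporting much longer inputs natively, which is one reason it is a natural baseline alongside BERT and RoBERTa here.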
VB-LoRA: Extreme Parameter Efficient Fine-Tuning with Vector Banks
Li, Yang, Han, Shaobo, Ji, Shihao
As the adoption of large language models increases and the need for per-user or per-task model customization grows, the parameter-efficient fine-tuning (PEFT) methods, such as low-rank adaptation (LoRA) and its variants, incur substantial storage and transmission costs. To further reduce stored parameters, we introduce a "divide-and-share" paradigm that breaks the barriers of low-rank decomposition across matrix dimensions, modules and layers by sharing parameters globally via a vector bank. As an instantiation of the paradigm to LoRA, our proposed VB-LoRA composites all the low-rank matrices of LoRA from a shared vector bank with a differentiable top-$k$ admixture module. VB-LoRA achieves extreme parameter efficiency while maintaining comparable or better performance compared to state-of-the-art PEFT methods. Extensive experiments demonstrate the effectiveness of VB-LoRA on natural language understanding, natural language generation, and instruction tuning tasks. When fine-tuning the Llama2-13B model, VB-LoRA only uses 0.4% of LoRA's stored parameters, yet achieves superior results. Our source code is available at https://github.com/leo-yangli/VB-LoRA.
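The core mechanism, composing low-rank matrices from a shared vector bank via top-k admixture, can be illustrated in a few lines. A minimal numpy sketch with hypothetical sizes (a bank of 16 vectors of length 4, k=2); the actual method learns the bank and the selection logits end-to-end:

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def topk_admixture(bank, logits, k):
    """Select the top-k bank vectors by logit, renormalize their weights
    with a softmax, and mix them into one sub-vector."""
    idx = np.argsort(logits)[-k:]   # indices of the k largest logits
    w = softmax(logits[idx])        # softmax over only the selected entries
    return w @ bank[idx]            # (k,) @ (k, b) -> (b,)

# Hypothetical sizes: a bank of h=16 vectors of length b=4; a length-8
# column of a LoRA factor is composed from 8/4 = 2 sub-vectors.
rng = np.random.default_rng(0)
bank = rng.standard_normal((16, 4))     # the globally shared vector bank
logits = rng.standard_normal((2, 16))   # selection logits, one row per sub-vector
sub_vectors = [topk_admixture(bank, row, k=2) for row in logits]
lora_column = np.concatenate(sub_vectors)  # length-8 column of a low-rank factor
```

Because every module and layer draws from the same bank, the stored parameters reduce to the bank plus the per-sub-vector logits, which is what drives the extreme storage savings.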
The impact of spatio-temporal travel distance on epidemics using an interpretable attention-based sequence-to-sequence model
Jiang, Yukang, Tian, Ting, Xie, Huajun, Guo, Hailiang, Wang, Xueqin
Amidst the COVID-19 pandemic, travel restrictions have emerged as crucial interventions for mitigating the spread of the virus. In this study, we enhance the predictive capabilities of our model, Sequence-to-Sequence Epidemic Attention Network (S2SEA-Net), by incorporating an attention module, allowing us to assess the impact of distinct classes of travel distances on epidemic dynamics. Furthermore, our model provides forecasts for new confirmed cases and deaths. To achieve this, we leverage daily data on population movement across various travel distance categories, coupled with county-level epidemic data in the United States. Our findings illuminate a compelling relationship between the volume of travelers at different distance ranges and the trajectories of COVID-19. Notably, a discernible spatial pattern emerges with respect to these travel distance categories on a national scale. We unveil the geographical variations in the influence of population movement at different travel distances on the dynamics of epidemic spread. This will contribute to the formulation of strategies for future epidemic prevention and public health policies.
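The interpretability claim rests on the attention module: the attention weights over travel-distance categories quantify each category's influence on the forecast. A minimal scaled dot-product sketch with hypothetical distance bins and dimensions (the paper's encoder/decoder details are omitted):

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

def distance_attention(query, keys, values):
    """Scaled dot-product attention; the weight vector over the
    travel-distance categories is the interpretable quantity."""
    scores = keys @ query / np.sqrt(query.size)
    weights = softmax(scores)
    return weights @ values, weights

# Hypothetical setup: four travel-distance categories, 8-dim encodings.
categories = ["<1 km", "1-10 km", "10-100 km", ">100 km"]
rng = np.random.default_rng(1)
d = 8
keys = rng.standard_normal((len(categories), d))    # encoded mobility series
values = rng.standard_normal((len(categories), d))
query = rng.standard_normal(d)                      # decoder state at a forecast step
context, weights = distance_attention(query, keys, values)
top_category = categories[int(np.argmax(weights))]  # most influential distance class
```

Averaging the learned weights over time and across counties is one way to surface the national-scale spatial patterns the abstract describes.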
#ICML2023 invited talk: Shakir Mohamed on ML with social purpose
The 40th International Conference on Machine Learning (ICML) took place in Honolulu, Hawai'i from 23-29 July 2023. There were four invited talks as part of the programme, and in this post we summarise the presentation by Shakir Mohamed, "Machine learning with social purpose". In a talk of three interwoven parts, Shakir's aim was to encourage the amplification and acceleration of work on machine learning with social purpose. He is passionate about using machine learning to contribute to overcoming some of the global challenges that we face, and, as well as demonstrating some of his research in this space, he provided guidance on how researchers can widen their horizons and consider the social implications of their work. Modelling of weather and climate can have a big impact on society, with such models often providing the basis for decisions taken by policy makers.
#ICML2023 tweet round-up
The 40th International Conference on Machine Learning (ICML) took place last week in Honolulu, Hawaiʻi. As well as four invited talks, the programme boasted oral and poster presentations, affinity events, tutorials and workshops. Find out what the participants got up to over the course of the conference.

Can't wait for our first invited speaker talks by the inimitable @MarzyehGhassemi and @shakir_za on Tuesday! pic.twitter.com/tNDi7RNIUt

Amazing group from #LatinXinAI hiking the Makapu'u Point Lighthouse Trail to kick off our social events at @icmlconf #ICML2023 @_LXAI pic.twitter.com/cO6dKAz6x8
Congratulations to the #ICML2023 outstanding paper award winners
This year's International Conference on Machine Learning (ICML) is taking place in Honolulu, Hawai'i from 23-29 July. The winners of the outstanding paper awards for 2023 have now been announced. This paper introduces an interesting approach that aims to address the challenge of obtaining a learning-rate-free optimal bound for non-smooth stochastic convex optimization. The authors propose a novel method that overcomes the limitations imposed by traditional learning rate selection in optimizing such problems. This research makes a valuable and practical contribution to the field of optimization.